- Frequently Asked Questions (FAQS);faqs.113
-
-
-
- 13. Process accounting is broken
- In 4.0.3, process accounting doesn't work. From examining the accounting
- scripts, it appears that /usr/lib/acct/accton is supposed to set a return code
- depending on whether accounting was switched on already or not. However, it
- always returns the same result - accounting switched off. This means that the
- /usr/lib/acct/ckpacct script, which is run every hour to keep the process
- accounting log in check, instead turns off accounting the first time it is run
- after booting. The same happens with the nightly /usr/lib/acct/monacct
- program.
- I don't yet know whether this bug is present in 4.0.4. It is definitely
- un-fixed in Dell 2.1 and Consensys 1.3. In Dell 2.2 the return bug is fixed,
- but accounting isn't automatically enabled at boot time.
-
- 14. tar(1) foos up in the presence of symbolic links
- Tar can get the names of symbolic links wrong when creating an archive.
- This bug can be demonstrated by doing the following:
-
- mkdir t
- cd t
- touch a 1234567890
- ln -s 1234567890 b
- ln -s a c
- tar vcf ../t.tar .
-
- The output generated by tar is:
-
- a ./ 0 tape blocks
- a ./a 0 tape blocks
- a ./1234567890 0 tape blocks
- a ./b symbolic link to 1234567890
- a ./c symbolic link to a234567890
-
- (Note the above commands should be done in the order shown and in a new
- directory) This bug is nasty. Recommended solution: use GNU tar.
- This is reported from Esix 4.0.3 and Consensys 1.3, but probably exists on
- other SVr4s as well.
-
- 15. Symbolic links can interfere with shellscript execution
- There is a problem running #! scripts when symbolic links are involved.
- Typing in the following from a command shell demonstrates the problem:
-
- mkdir a b
- ln -s a c
- cd a
- cat > script <<!
- #!/bin/sh
- echo Hello
- !
- chmod 755 script
- cd ../b
- ln -s ../c/script .
- ./script
-
- The message generated from the last line is:
-
- a/script: a/script: cannot open
-
- This is reported from Esix 4.0.3, Consensys 1.3, and Dell 2.2, but
- probably exists on other SVr4s as well.
-
- 16. Piping a csh builtin causes the shell to hang.
- While running csh, this can be demonstrated by some of the following:
-
- echo Hello | cat
- history | more
-
- (A solution to this one is to use tcsh-6.02.)
- This is reported from Esix 4.0.3 and Consensys 1.3, but probably exists on
- other SVr4s as well. It is reported fixed in Dell 2.2.
-
- 17. Quick port setup option in sysadm is broken
- In 4.0.3 sysadm, the quick port setup option, which is used to add and
- delete terminal ports, is seriously broken. The script modifies /etc/conf/*
- files, and has incorrect minor numbers, sets the 5th field of sdevice.d to Y
- when it should be N, and is missing columns for node.d. See
- /usr/sadm/sysadm/bin/q-add.
-
- 18. COFF binaries linked with curses(3) and shared libc hang
- ...eating the CPU. Cause unknown.
-
- 19. shl hangs, sxt devices bad
- shl(1) does not work. Try creating a layer and doing an 'ls'. Your session
- hangs. Bruce Momjian <root%candle.uucp@bts.com>, who reported this bug, says
- he believes it is the sxt devices which are broken. It definitely exists in
- Consensys 1.3.
-
- 20. num-lock prevents mouse from working properly
- When using the Motif window manager, if your num lock is on, your mouse
- clicks are not recognized by the window manager. The mouse still works in
- xterm(1). This is allegedly fixed in Destiny (4.2).
-
- 21. adjtime() doesn't work
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 adjtime() doesn't.
- Calling `date -a' works to adjust the time slowly.
-
- 22. ttymon drops DTR
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 the ttymon(1)
- utility for HDB uucp drops DTR every few weeks. The workaround is to disable
- and re-enable it.
-
- 23. cron mail doesn't go through aliasing
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 cron mail to adm
- doesn't get redirected by the aliases file.
-
- 24. fragility in xterm
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6, doing ~! from
- a cu in xterm kills xterm. This has been fixed in Dell 2.2.
-
- 25. csh lossage due to bad optimization
- If a csh user sources a non-existent file in their .cshrc (eg, source .alias,
- where .alias doesn't exist), then the system will hang for a couple of minutes.
- Eventually the user gets an "Out of memory" error and the console logs "NOTICE:
- out of swap space - Insufficient memory to allocate 2 pages - system call
- failed".
- This appears to be due to over-optimization of code surrounding a longjmp
- call.
- (There are numerous other reports of memory leak bugs in csh).
-
- 26. Bug in cp(1)
- If ``copy'' encounters a directory before a file, it dumps core ...
-
- --- cut ---
- cd /tmp
- mkdir copybug jnk
- cd jnk
- mkdir directory
- >file
- cp -r * /tmp/copybug
- --- cut ---
-
- This was reported from Consensys 4.0.3 but is probably a generic SVr4 bug.
-
- 27. tbl -me doesn't work
- Wolfgang Denk reports that trying to use "tbl -me" for any input file causes
- tbl to quit. The problem is that newer tbl versions don't accept [nt]roff
- control lines (".rm @W") after .TS.
-
- 28. who -r fragility leads to boot-time problems
- It coredumps if the name of the timezone is longer than three characters.
- This can be a real problem for European sites... and is potentially more
- hazardous than immediately apparent as _a lot_ of the initialization scripts
- (rc1.d, rc2.d) use ``who -r'' to see if the machine is in single- or multi-user
- mode. And when ``who'' bombs out, the ``set'' command is given an empty
- command-line and can't do much else than print the shell variables, $1-$9
- remain empty ... meaning that more or less all the scripts fail in various ways
- and the system has an exceptionally hard time coming up.
-
- 29. at(1) breaks here-documents in shell scripts
- at adds gratuitous empty lines to the job submitted by the user.
- This prevents shell here-documents from working.
-
- III. Networking and File Sharing Bugs
-
- 1. NFS locking is unusably slow
- Randy Terbush <randy@dsndata.dsndata.com> has posted code which
- demonstrates a serious bug in the SVr4 NFS locking daemon.
- In his own words: "The symptoms are ~30% cpu usage by 'lockd' and
- severe slowing of the machines on the network. This program
- demonstrates that it takes ~20 seconds to obtain locks from an ailing
- 'lockd'. We have verified that this bug does not exist in HPUX 8.0x."
- Randy's code is too large to be included here. He is, quite
- rightly, exercised at USL's exceedingly slow response to this problem.
- The comment in his makefile reads, in part:
-
- # USL has admitted to the existence of this bug in version 4.0, 4.1,
- # and 4.2 of their distributed and yet to be released sources. This is
- # a network crippling problem that they have refused to fix until
- # release 4.3, which will be OVER 1 YEAR from today. (29 Oct 1992)
- # If your version of 'lockd' exhibits this same problem, I would
- # strongly urge you to contact your vendor and ask them to put some
- # pressure on USL to fix this problem. SVR4 is virtually useless in a
- # network of shared resources while this problem exists.
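
Randy's program is not reproduced here. As a rough stand-in (not his code), the following C sketch times one lock/unlock cycle using POSIX fcntl() record locks. The function name time_lock_cycle is invented for illustration, and pointing it at a file on an NFS mount is an assumption about how one would exercise lockd. On a healthy system the cycle should finish in a small fraction of a second; the report above describes roughly 20 seconds.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: time one write-lock/unlock cycle on 'path'.
 * Returns the elapsed time in seconds, or -1.0 on any error.
 */
double time_lock_cycle(const char *path)
{
    struct flock fl;
    struct timeval t0, t1;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1.0;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 means "to end of file" */

    gettimeofday(&t0, NULL);
    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* block until the lock is granted */
        close(fd);
        return -1.0;
    }
    fl.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &fl) < 0) {    /* release it again */
        close(fd);
        return -1.0;
    }
    gettimeofday(&t1, NULL);

    close(fd);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
}
```

Calling this in a loop against a file on the suspect mount and printing the result gives a crude picture of lock latency; it says nothing about lockd's CPU usage.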
-
- 2. UFS file system problems
- In stock USL 4.0.3, you can't use a UFS file system as the root; the system
- hangs if you try. Consensys, Dell, Esix, Microport, MST, and UHC all
- appear to have fixed this.
- David Aitken, the UNIX product manager at UHC, writes "The ufs as root file
- system [problem] was not really a bug, just a little oversight on USL's part -
- we have fixed it completely by adding one line to the /stand/boot script:
- rootfstype=ufs!" He adds that they've been using ufs on their lab machines for
- over 10 months with no trouble, and the latest UHC release defaults to ufs if
- you have more than 120MB of disk.
-
- 3. Byte-order problem with NFS when accessing Sun disks
- Christoph Badura <bad@generics.ka.sub.org> notes that the stock USL resolver
- library suffers from serious confusion about the byte order in the
- sockaddr_in structure. This bug is acknowledged by USL for the 4.0.4
- release. A symptom of this bug is that Sun disks will not mount correctly over
- NFS. As a workaround, try removing the references to /usr/lib/resolv.so from
- /etc/netconfig and rebooting your system. Unfortunately, this will mean
- you can't use nameservers.
- Alan Batie <batie@agora.rain.com> writes: "Actually, you don't have to
- remove resolv.so, just put tcpip.so first and have a hosts file with the names
- of hosts you want to do NFS mounts from. This way you can use nameservers for
- most things."
-
- 4. Under weird circumstances, lseek on UFS may cause corruption
- Christoph Badura <bad@generics.ka.sub.org> reports that a UFS lseek() to an
- offset which is a multiple of 4096 but not a multiple of 8192, followed by a
- write(), may corrupt the file being written. The bug shows up only if the
- file has no pages in the page pool associated with it at the seek offset and at
- 4k before the seek offset. He has sent USL a kernel fix for this, which was
- included in 4.0.4.
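
As a reading aid, the offset condition in this report can be written as a small C predicate. The function name is invented here, and the page-pool precondition is not something user code can test, so this captures only the arithmetic part of the description.

```c
/*
 * Hypothetical helper: returns 1 if 'off' is a multiple of 4096
 * but not a multiple of 8192, i.e. the offsets named in the report.
 */
int is_suspect_ufs_offset(long off)
{
    return off % 4096 == 0 && off % 8192 != 0;
}
```

Offsets such as 4096 and 12288 match; 0, 8192 and 16384 do not.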
-
- 5. FTP problems
- The in.ftpd on SVR4.0.3 does not support all the commands listed in RFC 959.
- When recent SCO UNIX/ODT versions ftp to SVR4.0.3, the SVR4 side will refuse,
- drop the connection, and core dump after you authenticate. This is because the
- SCO end sends the 'SYST' command ala RFC 959, and the SVR4.0.3 end doesn't
- recognise it. Some ports have fixed this.
- Christoph Badura adds: "The bug is due to a longjmp(3) on a sigjmpbuf obtained
- by sigsetjmp(3). ARGH. Testing led to a bug in the original BSD sources, which
- is still present in the NET/2 ftpd."
-
- 6. A bug in the WD80x3 support
- MST reports a serious bug in the SVr4 kernel support for this card. Here's
- how to reproduce it:
-
- server: init 3 and share (export) /usr for example.
-
- client: mount -F nfs server:/usr /mnt
- cd /mnt
- find . -print | cpio -ocBuv > /dev/null
-
- what happens:
- server and client will "hang" together.
-
- "cue":
- hit keys on server and/or client, hang will go away
- for 10-20 seconds temporarily. Yank BNC connectors
- do the same trick.
-
- They say they've heard from customers that this happens on Dell, UHC as well
- as USL 4.0.4. PCNFS/BWNFS network xcopy suffers this as well. Client can be a
- Sun Sparc for that matter.
-
- IV. SCSI Support Problems
-
- 1. sar is confused by SCSI
- Sar -d doesn't work on SCSI drives. Dell fixed this in 2.1 and it's
- reported to work OK in Esix 4.0.3A; no report of any other SVr4 having fixed
- this yet. SCO fixed it in 3.2.4.
-
- 2. A configuration problem
- Stock USL requires you to jumper your SCSI devices to fixed IDs
- during installation (it can be changed to any other ID after).
- Dell says they've fixed this. The requirement is definitely still present
- in Esix and Consensys 1.3. UHC thinks they've fixed this, but their 4.0.3.6
- release still seems to demand ID 1 to install.
-
- 3. Synchronous SCSI hang problem
- David Wexelblat <dwex@mtgzfs3.att.com> reports: "Stock SVR4.0.3 will hang
- the SCSI bus with a 1542 in synchronous mode. Dell fixed this, and this has
- been given to Microport [ed note: Microport 4.0.4 and Consensys 4.0.3 have
- fixed the problem; MST UNIX and Esix 4.0.3 still have this problem; I have not
- yet been able to determine if ESIX 4.0.4 does]. In the file /sbin/bcheckrc,
- change the line:
-
- echo MARK > /dev/rswap
-
- to
-
- echo MARK | dd of=/dev/rswap bs=512 conv=sync > /dev/null 2>&1
-
- The magic is apparently the conv=sync, which forces a 512 byte block
- to be written. The original echo writes 4 bytes, which apparently causes
- synchronous SCSI to go out to lunch.
-
- Now, you ask, how can I fix this, since the system won't boot? There are
- a couple of methods. First, if possible, disable synchronous negotiation
- (1542 jumper J5-1 removed, plus whatever you may need to do to your drive).
- Then boot up, edit /sbin/bcheckrc, then shutdown, restrap for synchronous,
- then reboot. Everything should be OK.
-
- That's the easy way. Unfortunately, some hard drives will only work
- in synchronous mode. Well, you can still recover from this phenomenon.
- Here's how:
-
- 1) Install on your hard drive
- 2) Boot from the first boot floppy. When it tells you to, insert
- the second boot floppy. At the first prompt, hit <DEL> to
- break out to a shell.
- 3) Mount your hard drive under /mnt with the following command
- (replace FS-TYPE with s5, s52, or ufs, whichever you used for
- your root partition):
-
- /etc/fs/FS-TYPE/mount /dev/dsk/c0t0d0s1 /mnt
-
- 4) Now edit /mnt/sbin/bcheckrc:
-
- ed /mnt/sbin/bcheckrc
-
- You may want the 'ed' man page handy (I barely remember how to
- use 'ed' :->). For simplicity, you can delete/comment out
- the offending line, then replace it with the correct line later.
- 5) Unmount the hard drive:
-
- umount /mnt
-
- 6) Reboot from the hard drive. Everything should come up OK, and
- you can finish editing /sbin/bcheckrc, if necessary.
-
- Note that you perform these actions at your own risk. The first version was
- performed by me on Microport SVR4, and the second was performed by someone
- else (on my suggestion) on ESIX SVR4."
- This problem appears to be fixed on Consensys 1.3 and Dell 2.1.
-
- 4. ps chokes on commands that do SCSI I/O
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6, ps
- doesn't work when a SCSI command is in progress. It stops printing at the
- process executing the scsi command.
- This is still broken in Dell 2.2.
-
- 5. Transfer speed problems with Adaptec 1542B on 486s
- If a system mount or install fails, try setting the DMA speed to 5MB/s,
- rather than the default 5.7MB/s. This is accomplished by removing the jumper
- shorting the 12th pin pair of jumper block 5.
-
- V. Development Tools Problems
-
- 1. General UCB library brokenness
- The BSD compatibility libraries were badly broken in USL code. A Dell
- source adds "That meant that almost all the apps derived from them were broken
- too. Most stuff like automount will die when you send a SIGHUP, instead of
- rereading the map file. You can get a system into very strange states when
- that happens."
- John Sully <jms@mport> of Microport opines: "This is a bug in automount
- itself rather than BSD compatibility, since the automount which comes with SVR4
- is not compiled with the BSD libraries. (isn't this comforting?? :-()."
-
- Esix and UHC's BSD libraries are USL stock. I don't yet know
- the status of other ports. Microport has run into things they think may be
- symptoms of this but have no fix yet.
-
- John Sully <jms@mport> of Microport counters with: "One common thread I find
- on reading of these problems is that the BSD compatibility libraries are
- *misused*. [...] The problem is that BSD and SYSV have similarly named .h files
- which sometimes contain different definitions for objects with the same name.
- This has been known to cause all sorts of problems because the SYSV headers are
- picked up and then the calls are satisfied from the BSD library rather than the
- shared object library. I have found that if you use /usr/ucb/cc that the BSD
- compatibility is much less broken than it would seem at first because it
- ensures that the correct headers are picked up."
-
- However, note that there is at least one *real* bug known --- as of 4.0.4
- the signal emulation cannot explicitly set a handler to SIG_DFL or SIG_IGN.
-
- Ron Guilmette <rfg@ncd.com> writes "[Library lossage] may be easily
- demonstrated by attempting to build and link the GNU C compiler with
- `-L/usr/ucblib -lucb'. The resulting compiler will most certainly
- crash and die." John Sully thinks this is because the /usr/ucb/cc
- compiler should have been used, but wasn't.
-
- 2. USL emulation of BSD signals doesn't work
- A different source reports that the USL implementation of BSD signals
- is broken in both 4.0.3 and 4.0.4; in particular, the sigvec() family doesn't
- work properly. It is possible to make minor tweaks to source to make such apps
- work properly with the native USL signals implementation.
-
- Here's more on the signals problem, thanks to Richard <rc@siesoft.co.uk>:
- ------------------------------------------------------------------------------
- The problem is to do with the signal() function that is within the BSD
- compatibility libc.
-
- To reproduce the problem do the following:
-
- #include <stdio.h>
- #include <sys/types.h>
- #include <signal.h>
- #include <sys/siginfo.h>
-
- main()
- {
- signal(SIGPIPE,SIG_IGN);
- pause();
- }
-
- and compile it with cc xx.c -o xx /usr/ucblib/libucb.a
-
- (John Sully observes that this is definitely wrong; /usr/ucb/cc should have
- been used rather than "cc ... -L/usr/ucblib -lucb" or the equivalent "cc ...
- /usr/ucblib/libucb.a".)
-
- If you run the program and then signal it with a SIGPIPE, the program
- will die, even though you've told it to ignore SIGPIPE.
-
- The fix is difficult unless you've got source because there's a missing 'else'
- clause from the signal() code. This is the only signal fault I've found in
- the BSD signal functions, details of the rumoured sigvec problem would be
- useful?
-
- If you're trying to compile an application you could change the application
- code to do the following, this does work..
-
- void
- catch(s)
- int s;
- {
- /* DO NOTHING */
- ;
- }
-
- main()
- {
- signal(SIGPIPE,catch);
- pause();
- }
-
- SUMMARY
- You can only change a signal handler to a function handler, any number of
- times. Any attempt to set the handler to SIG_DFL, or SIG_IGN will fail.
-
- This bug has given some people working with X11R5 aggro, causing the X server
- to die when you close a client.
-
- Christoph Badura <bad@flatlin.ka.sub.org> confirms this bug.
- He has sent USL a source fix. It appears already to have been fixed in Dell
- 2.2.
- ------------------------------------------------------------------------------
-
- 3. Possible string library problems
- There are also persistent rumors of problems in the BSD-emulation string
- libraries. I have not been able to pin down specifics on this.
-
- 4. USL's ndbm support is broken.
- Christoph Badura <bad@generics.ka.sub.org> reports "The ndbm functions in
- the ucb library are broken [apparently due to a compiler or optimizer bug in cc
- -- ed.]. Try making the whatis data base for /usr/share/man with Tom
- Christiansen's perl rewrite of man.
- The easiest way to fix this is to compile GNU's replacement ndbm.c with gcc
- -fpcc-struct-return -traditional (gcc1.40 or 2.2 will do nicely) and install it
- in your C library. Source is available for FTP from prep.ai.mit.edu.
-
- 5. An include file is missing
- Both 4.0.3 and 4.0.4 USL versions are missing the documented dial.h
- file from their /usr/include directory. Dell 2.1 has it.
-
- 6. sscanf(3) has a potential bug
- Anthony Shipman <als@bohra.cpg.oz.au> reports: " I found the following bug
- in SCO Unix 3.2.* and I think it may be common to many AT&T derived Unixes.
-
- sscanf() calls _doscan() to read from a pretend file. The file
- uses the string as a buffer and a fake file descriptor of 60 (=_NFILE).
- Since _NFILE (for SCO UNIX) is 60 it assumes that fd 60 can never be open.
-
- Then when fscanf() hits the end of the string it calls _filbuf() to read
- into the buffer (which is the string) from fd 60. This should fail with
- an errno=9 and then _filbuf() sets EOF and it all terminates.
-
- However in SCO Unix you can reconfigure the kernel to increase the number
- of files per process to a recommended maximum of 150. If you do this then
- your program might have fd 60 open one day. Then sscanf() will read from this
- file overwriting your string. The byte count to the read() in _filbuf()
- is some undefined but large value so a lot of memory will be overwritten. In
- my case the string was on the stack so my stack was wiped.
-
- In short if you configure your kernel to have NOFILES > _NFILE ie more than
- the default then sscanf() is a time bomb in your code."
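
One way to see whether a program is exposed is to check, just before calling sscanf(), whether the descriptor whose number equals the library's _NFILE is actually open. The sketch below assumes a POSIX fcntl(); the helper name fd_is_open is made up for illustration, and 60 is the value quoted above for SCO UNIX, not necessarily the value on your system.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: returns 1 if 'fd' is an open descriptor
 * in this process, 0 otherwise.
 */
int fd_is_open(int fd)
{
    /* F_GETFD fails (returns -1, errno EBADF) on a closed descriptor. */
    return fcntl(fd, F_GETFD) != -1;
}
```

A program could log a warning whenever fd_is_open(60) is true at a point where it is about to call sscanf(). This detects the dangerous condition; it does not fix the library.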
-
- 7. Compiler problems
- Ronald Guilmette <rfg@ncd.com> also reports the following:
-
- ------------------------------------------------------------------------------
- /* Here is a bug in the original SVR4 C compiler (aka C Issue 5) which
- effectively prevents you from making good use of the `const' and
- `volatile' qualifiers defined by ANSI C in conjunction with pointer
- types and typedef statements. Compile this code and you will get:
-
- "qualifiers.c", line 23: left operand must be modifiable lvalue: op "="
-
- ...if your copy of the svr4 C compiler still has the bug. Note that
- given these declarations, the ANSI C standard says that the thing pointed
- to by the variable `pci' should be considered to be constant... not the
- variable `pci' itself. (The GCC compiler, either version 1.x or version
- 2.x, correctly compiles this example without complaint.)
- */
-
- typedef const int *ptr_to_const_int;
-
- ptr_to_const_int pci;
-
- int i;
-
- void main ()
- {
- pci = &i;
- }
- ------------------------------------------------------------------------------
- /* Here is a subtle bug in the original SVR4 C compiler (aka C Issue 5)
- which prevents you from first declaring a tagged type (i.e. a struct
- type or a union type) in a parameter list, and then defining that tagged
- type later on within the same scope. (Note that according to the ANSI C
- standard, the scope in which parameters get declared and the outermost
- block of a function body are one and the same scope. Thus, this really
- is legal ANSI C code!)
-
- Try compiling this with your C compiler on SVR4. If your compiler still
- has the bug, you will get:
-
- "tagged_type.c", line 24: warning: dubious tag declaration: struct S
- "tagged_type.c", line 28: warning: improper member use: i
- "tagged_type.c", line 28: warning: improper member use: i
- "tagged_type.c", line 31: warning: dubious tag declaration: struct S
- "tagged_type.c", line 35: warning: improper member use: i
- "tagged_type.c", line 35: warning: improper member use: i
-
- (The GCC compiler also had this bug in version 1.x, but it has been fixed
- in version 2.x.)
- */
-
- void foobar1 (arg) /* use old-style without prototypes */
- struct S *arg;
- {
- struct S { int i; }; /* define the type `struct S' */
-
- arg->i = arg->i; /* legal according to ANSI C rules! */
- }
-
- void foobar2 (struct S *arg) /* use new-style with prototypes */
- {
- struct S { int i; }; /* define the type `struct S' */
-
- arg->i = arg->i; /* legal according to ANSI C rules! */
- }
- ------------------------------------------------------------------------------
- /* Here is a serious bug in the original SVR4 `dump' program which dumps
- out parts of object files in either plain hex form or symbolically.
-
- To see the `dump' program get a segfault and die, save this code under
- the name `dump-bug.c' and then do:
-
- cc -g -c dump-bug.c
- dump -v -D dump-bug.o
-
- The bug arises whenever `dump' tries to read Dwarf debugging information
- for an array of pointers to any "user defined" type (e.g. `struct S' in
- this example). Past that point, `dump' is totally confused, so further
- Dwarf debugging information finally causes it to go belly-up.
- */
-
- struct S { int i; };
- struct S *array[10];
- int j;
- ------------------------------------------------------------------------------
- It appears that the svr4 C compiler (for x86 machines) doesn't conform real
- well to either the letter or the spirit of the IEEE 754 floating-point
- standard. In particular, "unordered comparisons" and other operations on
- NaNs don't always produce the result that the IEEE 754 standard calls
- for.
-
- An AT&T source comments: "This is documented in the SVID as a future direction.
- We do not support NaNs in -Xa and -Xt modes, only in -Xc. Try
- isnan(sqrt(-1.0)) to determine which modes support it."
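
A quick way to probe a given compiler mode is to generate a NaN at run time and check that every ordered comparison against it is false. The sketch below is an illustration under assumptions: the function names are invented, and the NaN is produced by 0.0/0.0 rather than the sqrt(-1.0) call suggested above, so that no math library is needed.

```c
/*
 * Produce a NaN at run time. 'volatile' keeps the compiler from
 * evaluating the division at compile time.
 */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/*
 * Returns 1 if comparisons on x behave as IEEE 754 requires for a
 * NaN operand (x != x is true, everything else is false), else 0.
 */
int compares_like_nan(double x)
{
    return (x != x) && !(x == x) &&
           !(x < 1.0) && !(x > 1.0) &&
           !(x <= 1.0) && !(x >= 1.0);
}
```

If compares_like_nan(make_nan()) returns 0 in some compilation mode, that mode is not treating unordered comparisons the way the standard describes.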
- ------------------------------------------------------------------------------
-
- The compiler fails to issue diagnostics for cases where a floating point
- literal is given which exceeds the range of its type (either float or
- double). Actually this one could be argued either way, since IEEE FP
- format includes "infinities" and the compiler probably just changes any
- FP value which is out of range for its type into either positive infinity
- or negative infinity (as appropriate).
-
- The compiler fails to issue diagnostics in cases where a typedef name is
- reused to declare a formal parameter, as in:
-
- -----------------------------------------------------------------------
- typedef int FOO;
- void bar (FOO)
- int FOO;
- {
- }
- -----------------------------------------------------------------------
-
- The compiler crashes on the following invalid input:
-
- -----------------------------------------------------------------------
- int i;
- volatile void *pvv;
-
- void pvv_test ()
- {
- (i ? *pvv : *pvv); /* ERROR */
- }
- -----------------------------------------------------------------------
-
- The compiler fails to issue diagnostics for cases where an attempt is
- made to "forward declare" an enum type (without also defining it), as
- in:
-
- -----------------------------------------------------------------------
- enum enum0 *ep; /* ERROR */
- -----------------------------------------------------------------------
-
- The compiler rejects the following code with an error, although there
- seems to be no good reason why it should (because no object is being
- declared).
-
- -----------------------------------------------------------------------
- #include <limits.h>
-
- typedef char array_type[ULONG_MAX];
- -----------------------------------------------------------------------
-
- VI. The FUBYTE Problem
-
- (Thanks to Christoph Badura <bad@flatlin.ka.sub.org> for this info)
-
- The kernel function fubyte() is documented to return a positive value when
- given a valid user space address and -1 otherwise. In the latter case u.u_error
- is set to EFAULT. USL SysV R4.0.3 has a sign extension bug in the
- implementation of fubyte() for local file descriptors (i.e. not opened via
- RFS), which causes fubyte() to return negative values if the byte fetched has
- its high bit set. This bug doesn't affect STREAMS drivers, as they don't call
- (and in fact are normally unable to call) fubyte(). Thus writing a byte with
- the high bit set to certain character device drivers returns with -1 and errno
- set to EFAULT.
-
- The bug may affect any character device driver that calls fubyte(). It's not
- limited to serial card drivers. The bug is noticed most often with serial card
- drivers, since uucp uses byte values > 127 very early during g-protocol setup
- and drivers for serial cards tend to use fubyte() quite often.
-
- Note also that the bug's effect is different if the driver checks for a -1
- return value of fubyte() or just a negative one. In the former case it is
- possible to pass bytes with the 8th bit set through fubyte(), except for 0xff
- which is -1 in two's complement. That makes the bug more obscure.
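
The effect is easy to model in user space. The following C sketch is not kernel code and does not call fubyte(); the function names are invented. It only shows what sign extension does to a fetched byte and how many byte values each style of driver check would wrongly reject.

```c
/*
 * Buggy behaviour: the byte is widened as a signed quantity.
 * The conversion to signed char is implementation-defined for values
 * above 127; on two's complement machines 0x80..0xff become -128..-1.
 */
int fetch_sign_extended(unsigned char byte)
{
    return (int)(signed char)byte;
}

/* Correct behaviour: the byte is widened as unsigned, giving 0..255. */
int fetch_zero_extended(unsigned char byte)
{
    return (int)byte;
}

/*
 * Count the byte values a driver would mistake for a fault.
 * strict != 0: driver tests the result for == -1.
 * strict == 0: driver tests the result for < 0.
 */
int count_rejected(int strict)
{
    int b, n = 0;

    for (b = 0; b <= 255; b++) {
        int r = fetch_sign_extended((unsigned char)b);
        if (strict ? (r == -1) : (r < 0))
            n++;
    }
    return n;
}
```

With the strict test only 0xff is lost; with the looser test every byte from 0x80 up is lost, which matches the two cases described above.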
-
- The fix is easy. First, make a backup copy of the kernel object file
- /etc/conf/pack.d/kernel/vm.o! A disassembly of vm.o(lfubyte) should reveal
- *exactly* one mov[s]bl (move byte to long w/sign extend). That one needs to be
- patched into a movzbl (zero extend). The difference is one bit in the second
- byte of the opcode.
-
- The movsbl has the bit pattern 00001111 1011111w mod/rm-byte.
- The movzbl has the bit pattern 00001111 1011011w mod/rm-byte.
-
- The 'w' bit is 0 for the instruction in question. So the opcodes are 0f be and
- 0f b6. Here is the diff -c from dis -F lfubyte showing the patch applied to
- the Dell 2.1 kernel:
-
- *** vm.o Mon Mar 9 00:31:38 1992
- --- vm.o.org Mon Mar 9 00:32:40 1992
- ***************
- *** 22,28 ****
- 11c90: 85 c0 testl %eax,%eax
- 11c92: 75 09 jne 0x9 <11c9d>
- 11c94: 8b 45 08 movl 8(%ebp),%eax
- ! 11c97: 0f b6 00 movzbl (%eax),%eax
- 11c9a: 89 45 fc movl %eax,-4(%ebp)
- 11c9d: c7 05 d8 13 00 00 00 00 00 00 movl $0x0,0x13d8
- 11ca7: 83 3d dc 13 00 00 00 cmpl $0x0,0x13dc
- --- 22,28 ----
- 11c90: 85 c0 testl %eax,%eax
- 11c92: 75 09 jne 0x9 <11c9d>
- 11c94: 8b 45 08 movl 8(%ebp),%eax
- ! 11c97: 0f be 00 movsbl (%eax),%eax
- 11c9a: 89 45 fc movl %eax,-4(%ebp)
- 11c9d: c7 05 d8 13 00 00 00 00 00 00 movl $0x0,0x13d8
- 11ca7: 83 3d dc 13 00 00 00 cmpl $0x0,0x13dc
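
For readers who prefer to see the one-bit change as code, here is a hedged C sketch. It works on an in-memory copy of the bytes of a single routine and refuses to do anything unless it finds exactly one 0f be pair, mirroring the "exactly one" check above. The function name is invented, and locating lfubyte inside vm.o and writing the result back are deliberately not shown. A plain byte search can also match operand bytes, so always confirm against a disassembly first.

```c
#include <stddef.h>

/* Opcode bytes from the bit patterns quoted above, with w = 0. */
#define OP_ESCAPE 0x0f
#define OP_MOVSBL 0xbe
#define OP_MOVZBL 0xb6

/*
 * Hypothetical helper: 'code' holds the bytes of one routine only.
 * Returns 1 and patches in place if exactly one "0f be" pair is
 * present; otherwise returns 0 and leaves the buffer untouched.
 */
int patch_single_movsbl(unsigned char *code, size_t len)
{
    size_t i, hit = 0;
    int count = 0;

    for (i = 0; i + 1 < len; i++) {
        if (code[i] == OP_ESCAPE && code[i + 1] == OP_MOVSBL) {
            count++;
            hit = i + 1;          /* index of the byte to change */
        }
    }
    if (count != 1)
        return 0;

    code[hit] = OP_MOVZBL;        /* 0xbe -> 0xb6: clears one bit (0x08) */
    return 1;
}
```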
-
- Of course there is a workaround at the driver level. Canonically, one would do
- this by checking for fubyte() returning -1 *and* u.u_error being set to EFAULT
- (u.u_error is cleared upon entering a system call). However, in R4.0.3
- fubyte() does NOT set u.u_error. It *does* set u.u_fault_catch.fc_errno.
-